CLAIMS 



Having thus described our invention, what we claim as new and desire to 
secure by Letters Patent is as follows: 

1. A method of executing a linear algebra subroutine, said method comprising: 

for an execution code controlling operation of a floating point unit (FPU) 
performing said linear algebra subroutine execution, unrolling an instruction to 
preload data into a floating point register (FReg) of said FPU, said unrolling 
causing said instruction to load data into said FReg to be inserted into a sequence 
of instructions that execute said linear algebra subroutine on said FPU. 

2. The method of claim 1, wherein said instruction is unrolled repeatedly until 
the data loading reaches a steady state in which a data loading exceeds a data 
consumption. 

3. The method of claim 1, wherein said linear algebra subroutine comprises a 
matrix multiplication operation. 

4. The method of claim 1, wherein said linear algebra subroutine comprises a 
subroutine from a LAPACK (Linear Algebra PACKage). 

5. The method of claim 4, wherein said LAPACK subroutine comprises a BLAS 
Level 3 L1 cache kernel. 



6. An apparatus, comprising: 

a memory to store matrix data to be used for processing in a linear algebra 
program; 

a floating point unit (FPU) to perform said processing; and 

a load/store unit (LSU) to load data to be processed by said FPU, said LSU 
loading said data into a plurality of floating point registers (FRegs), 

wherein matrix data is preloaded into said FRegs prior to being required 
by said FPU. 

7. The apparatus of claim 6, wherein said preloading is achieved by unrolling a 
loading instruction so that a load occurs every cycle until a preload condition has 
been satisfied. 

8. The apparatus of claim 6, wherein said linear algebra program comprises a 
matrix multiplication operation. 

9. The apparatus of claim 6, wherein said linear algebra program comprises a 
subroutine from a LAPACK (Linear Algebra PACKage). 

10. The apparatus of claim 9, wherein said LAPACK subroutine comprises a 
BLAS Level 3 L1 cache kernel. 

11. The apparatus of claim 6, further comprising: 

a compiler to generate an instruction for said preloading. 

12. A signal-bearing medium tangibly embodying a program of machine-readable 
instructions executable by a digital processing apparatus to perform a method of 
executing a linear algebra subroutine, said method comprising: 

for an execution code controlling operation of a floating point unit (FPU) 
performing said linear algebra subroutine execution, unrolling an instruction to 
preload data into a floating point register (FReg) of said FPU, said unrolling 
causing said instruction to load data into said FReg to be inserted into a sequence 
of instructions that execute said linear algebra subroutine on said FPU. 

13. The signal-bearing medium of claim 12, wherein said instruction is unrolled 
repeatedly until the data loading reaches a steady state in which a data loading 
exceeds a data consumption. 

14. The signal-bearing medium of claim 12, wherein said linear algebra program 
comprises a matrix multiplication operation. 

15. The signal-bearing medium of claim 12, wherein said linear algebra program 
comprises a subroutine from a LAPACK (Linear Algebra PACKage). 

16. The signal-bearing medium of claim 15, wherein said LAPACK subroutine 
comprises a BLAS Level 3 L1 cache kernel. 

17. A method of providing a service involving at least one of solving and 
applying a scientific/engineering problem, said method comprising at least one of: 

using a linear algebra software package that computes one or more matrix 
subroutines, wherein said linear algebra software package generates an execution 
code controlling a load/store unit loading data into a floating point register (FReg) 
for a floating point unit (FPU) performing a linear algebra subroutine execution, 
such that, for an execution code controlling operation of said FPU, an instruction 
is unrolled to cause a preloading of data into said FReg; 

providing a consultation for purpose of solving a scientific/engineering 
problem using said linear algebra software package; 

transmitting a result of said linear algebra software package on at least one 
of a network, a signal-bearing medium containing machine-readable data 
representing said result, and a printed version representing said result; and 

receiving a result of said linear algebra software package on at least one of 
a network, a signal-bearing medium containing machine-readable data 
representing said result, and a printed version representing said result. 

18. The method of claim 17, wherein said linear algebra subroutine comprises a 
subroutine from a LAPACK (Linear Algebra PACKage). 

19. The method of claim 18, wherein said LAPACK subroutine comprises a 
BLAS Level 3 L1 cache kernel. 